Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the 
application. 

1. (Previously Amended) An apparatus comprising:

a first processor to execute a main thread instruction stream that includes a delinquent 
instruction, wherein the first processor is to be associated with a first private cache; 

a second processor to execute a helper thread instruction stream that includes a subset of 
the main thread instruction stream, wherein the subset includes the delinquent instruction, 
and wherein the second processor is to be associated with a second private cache; 

a shared memory system coupled to said first processor and to said second processor; and 

control logic to push, responsive to a miss of requested data for the delinquent
instruction in the second private cache associated with the second processor, the requested 
data to the first private cache associated with the first processor. 
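
Illustrative sketch (not part of the claim language): the push recited in claim 1 can be modeled in software roughly as follows. This is a minimal simulation under stated assumptions, not the claimed hardware. The names PrivateCache, ControlLogic, and helper_load are hypothetical, and the shared memory system is assumed to be a plain dictionary keyed by address.

```python
class PrivateCache:
    """Hypothetical model of a per-processor private cache."""

    def __init__(self):
        self.lines = {}

    def contains(self, addr):
        return addr in self.lines

    def fill(self, addr, data):
        self.lines[addr] = data


class ControlLogic:
    """Hypothetical model of the control logic of claim 1."""

    def __init__(self, shared_memory, main_cache, helper_cache):
        self.shared_memory = shared_memory  # assumption: dict of addr -> data
        self.main_cache = main_cache        # first private cache
        self.helper_cache = helper_cache    # second private cache

    def helper_load(self, addr):
        """Helper thread executes the delinquent instruction for addr."""
        if self.helper_cache.contains(addr):
            return self.helper_cache.lines[addr]  # hit: nothing is pushed
        # Miss in the second private cache: obtain the data and push it
        # to the first private cache without a request from the main thread.
        data = self.shared_memory[addr]
        self.main_cache.fill(addr, data)
        # Filling the helper's own cache is a separate limitation (claim 6),
        # so this sketch leaves the helper cache untouched.
        return data
```

In this model, a later load of the same address by the main thread would find the data already present in the first private cache.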

2. (Previously Amended) The apparatus of claim 1, wherein: the first processor,
second processor and logic are included within a chip package, and wherein the 
shared memory system includes a shared cache. 

3. (Previously Amended) The apparatus of claim 1, further comprising retrieval 
logic coupled to the control logic to retrieve the requested data from the shared 
memory system in response to the miss of the requested data for the delinquent
instruction in the second private cache. 



4. (Previously Amended) The apparatus of claim 3, wherein: the control logic to 
push, responsive to a miss of requested data for the delinquent instruction in the 
second private cache, the requested data to the first private data cache of the first 
processor comprises the control logic, responsive to the miss of the requested data 
and the retrieval logic retrieving the requested data, to broadcast the requested 
data to an affinity group of processors including at least the first processor. 

5. (Previously Amended) The apparatus of claim 3, wherein: the control logic to 
push, responsive to a miss of requested data for the delinquent instruction in the 
second private cache, the requested data to the first private data cache of the first 
processor comprises the control logic, responsive to the miss of the requested data 
and the retrieval logic retrieving the requested data, to unicast the requested data 
to the first processor. 
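
Illustrative sketch (not part of the claim language): claims 4 and 5 differ only in how the retrieved data is delivered. The functions below are a hedged software model of that difference. The names push_broadcast and push_unicast are hypothetical, and each private cache is assumed to be a dictionary indexed by a core identifier.

```python
def push_broadcast(caches, affinity_group, addr, data):
    """Claim 4 style: deliver data to every cache in the affinity group."""
    for core_id in affinity_group:
        caches[core_id][addr] = data


def push_unicast(caches, main_core_id, addr, data):
    """Claim 5 style: deliver data only to the first processor's cache."""
    caches[main_core_id][addr] = data


# Usage: three cores, core 0 runs the main thread (an assumption).
caches = {0: {}, 1: {}, 2: {}}
push_broadcast(caches, affinity_group=[0, 2], addr=0x80, data="line")
push_unicast(caches, main_core_id=0, addr=0xC0, data="other")
```

Broadcasting avoids having to identify which core runs the main thread, at the cost of filling caches that may never use the data; unicasting makes the opposite trade.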

6. (Previously Amended) The apparatus of claim 1, wherein: the control logic is
further to provide the requested data from the shared memory system to the 
second private data cache associated with the second processor. 

7. (Previously Amended) A processor comprising: 

a first processor core to be associated with a first private cache and to execute a 
first thread; 


Appl. No. 10/632,431 Attorney Docket : 42P15449 

a second processor core to be associated with a second private cache and to 
execute a second speculative thread, the second speculative thread to 
include a load instruction from the first thread that a miss to the first 
private cache is anticipated during execution of the first thread; and 

logic to retrieve data for the load instruction from a higher level memory and to 

provide the data to the first private cache associated with the first processor 
core in response to a miss to the second private cache responsive to 
execution of the load instruction in the second speculative thread with the 
second processor core. 



8. (Previously Amended) The processor of claim 7, wherein: the logic to retrieve 

data for the load instruction from a higher level memory and to provide the data to 
the first private cache comprises in response to a miss to the second private cache 
responsive to execution of the load instruction in the second speculative thread 
with the second processor core comprises the logic to 

retrieve the data in response to the miss to the second private cache 
responsive to execution of the load instruction in the second 
speculative thread with the second processor, and 

broadcast the data to private caches associated with an affinity group of
processor cores, the affinity group of processor cores including at 
least the first processor core. 

9. (Previously Amended) The processor of claim 7, wherein: the logic to retrieve
data for the load instruction from a higher level memory and to provide the data to 
the first private cache comprises in response to a miss to the second private cache 
responsive to execution of the load instruction in the second speculative thread 
with the second processor core comprises the logic to 

retrieve the data in response to the miss to the second private cache 
responsive to execution of the load instruction in the second 
speculative thread with the second processor, and 

unicast the data to the first private cache associated with the first private 
core. 

10. (Previously Amended) The processor of claim 7, wherein: the higher-level 
memory includes a memory coupled external to the processor, and wherein the 
first processor core is further to trigger the second processor core's execution of 
the second speculative thread responsive to execution of a trigger instruction in 
the first thread with the first processor core.

11. (Previously Amended) An apparatus comprising:

a first processor to execute a main thread instruction stream that includes a delinquent 
load instruction, wherein the first processor is to be associated with a first data 
cache; 

a second processor to execute a helper thread instruction stream that includes a subset of 
the main thread instruction stream, wherein the subset includes the delinquent load 
instruction, and wherein the second processor is to be associated with a second 
data cache; 

retrieval logic to retrieve, responsive to a miss of requested data for the delinquent load 
instruction in the second data cache, the requested data; and 

control logic to provide the requested data to the first data cache associated with the first 
processor in response to the retrieval logic retrieving the requested data and 
without a request from the first processor. 

12. (Previously Amended) The apparatus of claim 11, further comprising: a third

processor and a shared memory system coupled to said first processor, said 
second processor, and said third processor. 



13. (Previously Amended) The apparatus of claim 12, wherein: retrieval logic to 

retrieve, responsive to a miss of requested data for the delinquent load instruction 
in the second data cache, the requested data comprises: 

the retrieval logic, responsive to the miss of requested data for the delinquent load 
instruction in the second data cache, to issue a snoop for the requested 
data to a third data cache associated with the third processor, and 

the third data cache to provide the requested data to the retrieval logic in response 
to receiving the snoop issued from the retrieval logic. 

14. (Original) The apparatus of claim 13, wherein: the retrieval logic to retrieve,

responsive to a miss of requested data for the delinquent load instruction in the 
second data cache, the requested data further comprises: 

the retrieval logic, responsive to the miss of requested data for the delinquent load 
instruction in the second data cache, to request the requested data from 
the shared memory system concurrently with issuing the snoop to the 
third data cache; and 

the retrieval logic to receive the requested data from the shared memory system in 
response to a miss of the requested data in the third data cache. 
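
Illustrative sketch (not part of the claim language): the concurrent snoop and shared-memory request of claims 13 and 14 can be modeled in software as below. This is a simplified model under stated assumptions. The function name retrieve is hypothetical, the third data cache and the shared memory system are assumed to be dictionaries, and a thread pool stands in for hardware concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve(addr, third_cache, shared_memory):
    """Return (data, source) for addr after a miss in the second data cache."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Issue the snoop and the shared-memory request at the same time.
        snoop = pool.submit(third_cache.get, addr)
        memory_request = pool.submit(shared_memory.get, addr)

        snooped = snoop.result()
        if snooped is not None:
            # The third data cache supplied the data in response to the snoop.
            return snooped, "third_cache"
        # Snoop missed: use the data already requested from shared memory.
        return memory_request.result(), "shared_memory"
```

Issuing both requests together means a snoop miss does not add the full shared-memory latency on top of the snoop latency.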



15. (Previously Amended) The apparatus of claim 12, wherein: the first, second, and 

third processors each include a processor core on a single package, and wherein 
the shared memory system includes a shared higher-level cache. 

16. (Cancelled)

17. (Original) The apparatus of claim 11, wherein: the first processor is further to 
trigger the second processor's execution of the helper thread instruction stream 
responsive to a trigger instruction in the main thread instruction stream.

18. (Previously Amended) A method comprising:

executing a main thread including a load instruction with a first core, the first core 
associated with a first private cache; 

missing a second private cache associated with a second core in response to executing 
the load instruction within a helper thread with the second core; and

prefetching load data for the load instruction into the first private cache responsive to 
missing the second private cache in response to executing the load instruction within the 
helper thread with the second core. 

19. (Previously Amended) The method of claim 18, wherein prefetching the load data 
for the load instruction into the first private cache further comprises: 

retrieving the load data from a shared memory system; and 

broadcasting the load data to an affinity group of cores including at least the first core in 
response to retrieving the load data from the shared memory system. 



20. (Previously Amended) The method of claim 18, further comprising: 

providing the load data for the load instruction from a shared memory system into the 
second private cache associated with the second core. 

21. (Previously Amended) The method of claim 18, further comprising:
providing the load data for the load instruction from a shared memory system into a 

private cache for each of a plurality of helper cores including the second core. 

22. (Previously Amended) The method of claim 18, wherein prefetching the load data
for the load instruction into the first private cache further comprises: 

retrieving the load data from a third private cache associated with a third core; and 

providing the load data to the private cache associated with the first core in response to 
retrieving the load data from a third private cache. 

23. (Previously Amended) The method of claim 18, wherein prefetching the load
data for the load instruction into the first private cache further comprises: 
concurrently: 

broadcasting a request for the load data to each of a plurality of cores; and 
requesting the load data from a shared memory system. 

24. (Previously Amended) The method of claim 23, wherein prefetching the load data
for the load instruction into the first private cache further comprises: 

providing, if the load data is available in a private cache of one of the plurality of 
cores, the load data to the main core from the private cache of one of the plurality of 
cores; and

providing, if the load data is not available in a private cache of one of the plurality of 
cores, the load data to the main core from the shared memory system. 

25. (Previously Amended) An article comprising: 

a machine-readable storage medium having a plurality of machine accessible instructions, 
which if executed by a machine, cause the machine to perform operations comprising: 

determining that a helper core has suffered a miss in a helper private cache for a load 
instruction from a main thread executing on a main core while executing a helper thread; 
and 

fetching load data for the load instruction in response to determining the helper core 
has suffered the miss in the helper private cache for the load instruction; and 

pushing, unsolicited from the main core, the load data into a main private cache of the 
main core in response to fetching the load data for the load instruction. 

26. (Previously Amended) The article of claim 25, wherein: fetching load data for the

load instruction in response to determining the helper core has suffered the miss 
in the helper private cache for the load instruction comprises: retrieving the load 
data from a shared memory system in response to determining the helper core has 
suffered the miss in the helper private cache for the load instruction; and wherein 
pushing, unsolicited from the main core, the load data into a main private cache of 
the main core in response to fetching the load data for the load instruction 
comprises: broadcasting the load data to an affinity group of private caches of an 
affinity group of cores including the main private cache of the main core. 

27. (Previously Amended) The article of claim 25, wherein: fetching load data for the load
instruction in response to determining the helper core has suffered the miss in the
helper private cache for the load instruction comprises: retrieving the load data
from a shared memory system in response to determining the helper core has 
suffered the miss in the helper private cache for the load instruction; and wherein 
pushing, unsolicited from the main core, the load data into a main private cache of 
the main core in response to fetching the load data for the load instruction 
comprises: unicasting the load data to the main private cache of the main core. 

28. (Previously Amended) The article of claim 25, further comprising: a plurality of 
machine accessible instructions, which if executed by a machine, cause the 
machine to perform operations comprising: providing load data for the load 
instruction from a shared memory system into the private cache for each of a 
plurality of helper cores including the helper private cache of the helper core. 

29. (Previously Amended) The article of claim 25, wherein: fetching load data for the
load instruction in response to determining the helper core has suffered the miss 
in the helper private cache for the load instruction comprises: retrieving the load 
data from an additional helper private cache of an additional helper core. 

30. (Cancelled)

31. (Cancelled) 

32. (Previously Amended) A system comprising: 
a memory system;

a first processor, coupled to the memory system, to execute a first instruction stream; 

a second processor, coupled to the memory system, to concurrently execute a second 
instruction stream; and 

helper threading logic to push fill data to a first private cache of the first processor, the 
fill data to be prefetched by the second processor and to be pushed to the first private cache 
before a request for the fill data is issued by the first processor. 

33. (Cancelled)



34. (Previously Amended) The system of claim 32, wherein: the helper threading 

logic is further to push the fill data to the first private cache of the first processor 
from a second private cache of the second processor. 



35. (Previously Amended) The system of claim 32, wherein: the helper threading 

logic is further to push the fill data to the first private cache of the first processor 
from the memory system. 



36. (Previously Amended) The system of claim 32, wherein the first processor and 

the second processor are each processor cores, and wherein the first processor 
core, the second processor core, and the helper threading logic are included in a 
single processor package coupled to the memory system.



37. (Previously Amended) The system of claim 36, wherein: the memory system 

includes a cache that is shared by the first and second processor cores coupled to a 
dynamic random access memory. 